On-the-fly filtering for KNN search #1

Open
sanikolaev wants to merge 1 commit into master from issue-4103-knn-filtering

@sanikolaev
Collaborator

This PR implements on-the-fly filtering for KNN search: the search algorithm keeps exploring until it has found k candidates that pass the filter, rather than finding k candidates in total and filtering them afterwards.

Key changes:

  • Added a k parameter to searchBaseLayerST() that specifies the target number of results that pass the filter when filtering is enabled
  • Added a use_filter flag that indicates whether filtering mode is active
  • Modified the termination condition: when use_filter is true, the search continues until k candidates that pass the filter are found, instead of stopping after ef total candidates
  • Updated the exploration logic (should_explore) to keep exploring neighbors while filtering is enabled and fewer than k passing candidates have been found, even if ef candidates have already been explored
  • Added a filter check (is_filtered) before a candidate is added to the top_candidates priority queue, so that only candidates that pass the filter are stored
  • The filter callback (BaseFilterFunctor* isIdAllowed) is invoked for each candidate node as (*isIdAllowed)(label) to determine whether it passes the filter
  • Updated searchKnn() to pass k to searchBaseLayerST() when a filter is provided
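
To illustrate the callback pattern from the list above, here is a minimal sketch of a filter functor and the `(*isIdAllowed)(label)` call. It is not the code from this PR: `labeltype` and `BaseFilterFunctor` are redefined locally as stand-ins for the library's types, and `AllowListFilter` and `passes_filter` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>

// Assumption: stand-in for the library's label type.
using labeltype = std::size_t;

// Assumption: local stand-in for the filter base class named in the PR.
// The default implementation lets every label through.
struct BaseFilterFunctor {
    virtual bool operator()(labeltype) { return true; }
    virtual ~BaseFilterFunctor() = default;
};

// Hypothetical filter: a label passes only if it is in an explicit allow-list.
struct AllowListFilter : BaseFilterFunctor {
    std::unordered_set<labeltype> allowed;

    explicit AllowListFilter(std::unordered_set<labeltype> ids)
        : allowed(std::move(ids)) {}

    bool operator()(labeltype label) override {
        return allowed.count(label) > 0;
    }
};

// Hypothetical helper showing the call site: a null filter means
// "no filtering", otherwise the functor decides per label.
bool passes_filter(BaseFilterFunctor* isIdAllowed, labeltype label) {
    return isIdAllowed == nullptr || (*isIdAllowed)(label);
}
```

With an allow-list of {2, 4}, `passes_filter(&filter, 2)` is true and `passes_filter(&filter, 3)` is false, while `passes_filter(nullptr, 3)` is true because no filter is set.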

The changes ensure that when filtering is enabled:

  1. The search continues until k candidates that pass the filter are found (if that many exist)
  2. Only candidates that pass the filter are added to top_candidates
  3. Exploration continues even after ef candidates have been explored, as long as fewer than k passing candidates have been found
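
The three guarantees above can be sketched on a toy graph. This is a simplified model, not the code from this PR: `ToyGraph`, `search_layer`, `make_chain` and `post_filter` are hypothetical names, vectors are 1-D points, and the sketch counts only stored results rather than tracking explored candidates separately. The names `top_candidates`, `candidate_set`, `should_explore`, `is_filtered`, `use_filter`, `ef` and `k` follow the description.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Toy stand-in for one graph layer: 1-D points plus adjacency lists.
struct ToyGraph {
    std::vector<double> pos;
    std::vector<std::vector<int>> links;
};

using Scored = std::pair<double, int>;  // (distance to query, node id)

// Simplified best-first search; returns stored node ids, nearest first.
std::vector<int> search_layer(const ToyGraph& g, double query, int entry,
                              std::size_t ef, std::size_t k, bool use_filter,
                              const std::function<bool(int)>& is_allowed) {
    std::priority_queue<Scored> top_candidates;  // max-heap: worst result on top
    std::priority_queue<Scored, std::vector<Scored>, std::greater<Scored>>
        candidate_set;                           // min-heap: closest unexplored first
    std::vector<bool> visited(g.pos.size(), false);
    double lower_bound = std::numeric_limits<double>::infinity();
    const std::size_t cap = use_filter ? std::max(ef, k) : ef;

    auto dist = [&](int id) { return std::fabs(g.pos[id] - query); };
    // Store a node only if it passes the filter (or filtering is off).
    auto offer = [&](double d, int id) {
        const bool is_filtered = !use_filter || is_allowed(id);
        if (!is_filtered) return;
        top_candidates.emplace(d, id);
        if (top_candidates.size() > cap) top_candidates.pop();
        lower_bound = top_candidates.top().first;
    };

    visited[entry] = true;
    candidate_set.emplace(dist(entry), entry);
    offer(dist(entry), entry);

    while (!candidate_set.empty()) {
        const Scored current = candidate_set.top();
        // With filtering on, "enough" means k passing results, not ef overall.
        const bool enough = top_candidates.size() >= (use_filter ? k : ef);
        if (enough && current.first > lower_bound) break;
        candidate_set.pop();
        for (int nb : g.links[current.second]) {
            if (visited[nb]) continue;
            visited[nb] = true;
            const double nd = dist(nb);
            const bool should_explore =
                top_candidates.size() < ef || nd < lower_bound ||
                (use_filter && top_candidates.size() < k);
            if (!should_explore) continue;
            candidate_set.emplace(nd, nb);
            offer(nd, nb);
        }
    }

    std::vector<int> ids;
    for (; !top_candidates.empty(); top_candidates.pop())
        ids.push_back(top_candidates.top().second);
    std::reverse(ids.begin(), ids.end());  // nearest first
    if (use_filter && ids.size() > k) ids.resize(k);
    return ids;
}

// Chain graph 0 - 1 - ... - (n-1); node i sits at position i.
ToyGraph make_chain(int n) {
    ToyGraph g;
    for (int i = 0; i < n; ++i) {
        g.pos.push_back(static_cast<double>(i));
        std::vector<int> nbrs;
        if (i > 0) nbrs.push_back(i - 1);
        if (i + 1 < n) nbrs.push_back(i + 1);
        g.links.push_back(nbrs);
    }
    return g;
}

// Baseline: apply the filter after the search, then keep at most k ids.
std::vector<int> post_filter(std::vector<int> ids, std::size_t k,
                             const std::function<bool(int)>& is_allowed) {
    ids.erase(std::remove_if(ids.begin(), ids.end(),
                             [&](int id) { return !is_allowed(id); }),
              ids.end());
    if (ids.size() > k) ids.resize(k);
    return ids;
}
```

On a 10-node chain with the query at position 0, `ef = 3`, `k = 2` and a filter that allows only nodes 6 and 8, the unfiltered search stores nodes 0, 1 and 2, so post-filtering returns nothing. With `use_filter` set, the same call keeps walking the chain and returns nodes 6 and 8.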

This enables more accurate KNN search results when combined with attribute filters, because the algorithm actively searches for candidates that pass the filter rather than relying on post-filtering, which may return fewer than k results.
